Abstract

We consider the problem of choosing a density estimate from a set of distributions
$\mathcal{F}$, minimizing the $L_1$-distance to an unknown distribution ([DL01]). Devroye
and Lugosi [DL01] analyze two algorithms for the problem: Scheffé tournament winner
and minimum distance estimate. The Scheffé tournament estimate requires fewer
computations than the minimum distance estimate, but has strictly weaker guarantees
than the latter.

We focus on the computational aspect of density estimation. We present two
algorithms, both with the same guarantee as the minimum distance estimate. The
first one, a modification of the minimum distance estimate, uses the same number
(quadratic in $|\mathcal{F}|$) of computations as the Scheffé tournament. The second one, called
"efficient minimum loss-weight estimate," uses only a linear number of computations,
assuming that $\mathcal{F}$ is preprocessed.
We also give examples showing that the guarantees of the algorithms cannot be
improved and explore randomized algorithms for density estimation.

1 Introduction

We study the following density estimation problem considered in [DL96, DL01, DGL02].
There is an unknown distribution $g$ and we are given $n$ (not necessarily independent)
samples which define the empirical distribution $h$. Given a finite class $\mathcal{F}$ of distributions, our
objective is to output $f \in \mathcal{F}$ such that the error $\|f - g\|_1$ is minimized. The use of the
$L_1$-norm is well justified because it has many useful properties, for example, scale invariance
and the fact that approximate identification of a distribution in the $L_1$-norm gives an
estimate for the probability of every event.

The following two parameters influence the error of a possible estimate: the distance of
$g$ from $\mathcal{F}$ and the empirical error. The first parameter is required since we have no control
over $\mathcal{F}$, and hence we cannot select a distribution which is better than the "optimal"
distribution in $\mathcal{F}$, that is, the one closest to $g$ in $L_1$-norm. It is not obvious how to define
the second parameter, the error of $h$ with respect to $g$. We follow the definition of [DL01],
which is inspired by [Yat85] (see Section 1.1 for a precise definition).

Devroye and Lugosi [DL01] analyze two algorithms in this setting: Scheffé tournament
winner and minimum distance estimate. The minimum distance estimate, defined
by Yatracos [Yat85], is a special case of the minimum distance principle, formalized by
Wolfowitz in [Wol57]. The minimum distance estimate is a helpful tool: for example, it
was used by [DL96, DL97] to obtain estimates for the smoothing factor for kernel density
estimates and also by [DGL02] for hypothesis testing.

The Scheffé tournament winner algorithm requires fewer computations than the minimum
distance estimate, but it has strictly weaker guarantees (in terms of the two parameters
mentioned above) than the latter. Our main contributions are two procedures for
selecting an estimate from $\mathcal{F}$, both of which have the same guarantees as the minimum
distance estimate, but are computationally more efficient. The first has a quadratic (in
$|\mathcal{F}|$) cost, matching the cost of the Scheffé tournament winner algorithm. The second one
is even faster, using linearly many (in $|\mathcal{F}|$) computations (after preprocessing $\mathcal{F}$).

Now we outline the rest of the paper. In Section 1.1 we give the required definitions
and introduce the notion of a test-function (a variant of Scheffé set). Then, in Section 1.2,
we restate the previous density estimation algorithms (Scheffé tournament winner and
the minimum distance estimate) using test-functions. Next, in Section 2, we present
our algorithms. The first one is a modification of the minimum distance estimate with
improved (quadratic in $|\mathcal{F}|$) computational cost. The second one, which we call "efficient
minimum loss-weight estimate," has only linear computational cost after preprocessing $\mathcal{F}$.
In Section 3 we explore randomized density estimation algorithms. In the final Section 4
we give examples showing tightness of the theorems stated in the previous sections.

Throughout this paper we focus on the case when $\mathcal{F}$ is finite, in order to compare the
computational costs of our estimates to previous ones. However, our results generalize in
a straightforward way to infinite classes as well if we ignore computational complexity.

1.1 Definitions and Notations 

Throughout the paper $g$ will be the unknown distribution and $h$ will be the empirical
distribution. Let $\mathcal{F}$ be a set of distributions. We will assume that $\mathcal{F}$ is finite (the results
generalize straightforwardly to infinite sets of distributions). Let $d_1(g, \mathcal{F})$ be the
$L_1$-distance of $g$ from $\mathcal{F}$, that is, $\min_{f \in \mathcal{F}} \|f - g\|_1$.

Given two functions $f_i, f_j$ on $\Omega$ (in this context, distributions) we define a test-function
$T_{ij} : \Omega \to \{-1, 0, 1\}$ to be the function $T_{ij}(x) = \mathrm{sgn}(f_i(x) - f_j(x))$. Note that $T_{ij} = -T_{ji}$.
We also define $\mathcal{T}_{\mathcal{F}}$ to be the set of all test-functions for $\mathcal{F}$, that is,

$$\mathcal{T}_{\mathcal{F}} = \{T_{ij} \mid f_i, f_j \in \mathcal{F}\}.$$

Let $\cdot$ be the inner product for the functions on $\Omega$. Note that

$$(f_i - f_j) \cdot T_{ij} = \|f_i - f_j\|_1.$$

We use the inner product of the empirical distribution $h$ with the test-functions to choose
an estimate, which is a distribution from $\mathcal{F}$.

In this paper we only consider algorithms which make their decisions purely on inner
products of the test-functions with $h$ and members of $\mathcal{F}$. It is reasonable to assume that
the computation of the inner product will take significant time. Hence we measure the
computational cost of an algorithm by the number of inner products used.

We say that $f_i$ wins against $f_j$ if

$$(f_i - h) \cdot T_{ij} < (f_j - h) \cdot T_{ji}. \qquad (1)$$

Note that either $f_i$ wins against $f_j$, or $f_j$ wins against $f_i$, or there is a draw (that is, there
is equality in (1)).
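
The definitions above can be illustrated on a finite domain. The following is a minimal sketch, assuming distributions are represented as lists of exact fractions indexed by the points of $\Omega$; the helper names (`sign_vector`, `dot`, `sub`, `wins`) and the toy data are hypothetical and not part of the paper.

```python
from fractions import Fraction as Fr

def sign_vector(fi, fj):
    """T_ij as a list: sgn(f_i(x) - f_j(x)) at each point x of the domain."""
    return [(a > b) - (a < b) for a, b in zip(fi, fj)]

def dot(u, v):
    """Inner product of two functions on a finite domain."""
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    """Pointwise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def wins(fi, fj, h):
    """Condition (1): (f_i - h) . T_ij < (f_j - h) . T_ji."""
    return dot(sub(fi, h), sign_vector(fi, fj)) < dot(sub(fj, h), sign_vector(fj, fi))

# Hypothetical toy data on a 3-point domain (not from the paper).
f1 = [Fr(1, 2), Fr(1, 2), Fr(0)]
f2 = [Fr(0), Fr(1, 2), Fr(1, 2)]
h = [Fr(1, 2), Fr(1, 3), Fr(1, 6)]

t12 = sign_vector(f1, f2)
l1_distance = sum(abs(a - b) for a, b in zip(f1, f2))
# The identity (f_i - f_j) . T_ij = ||f_i - f_j||_1 on this example.
identity_holds = dot(sub(f1, f2), t12) == l1_distance
```

Exact rational arithmetic is used here only so that the strict inequality in (1) and possible draws are decided without rounding error.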

The algorithms choose an estimate $f \in \mathcal{F}$ using the empirical distribution $h$. The
$L_1$-distance of the estimates from the unknown distribution $g$ will depend on the following
measure of distance between the empirical and the unknown distribution:

$$\Delta := \max_{T \in \mathcal{T}_{\mathcal{F}}} (g - h) \cdot T. \qquad (2)$$
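
As an illustration of (2), the following minimal sketch computes $\Delta$ by brute force over all ordered pairs of a finite family on a finite domain. The function names and the toy data are assumptions made for this example only.

```python
from fractions import Fraction as Fr
from itertools import product

def sign_vector(fi, fj):
    """T_ij as a list: sgn(f_i(x) - f_j(x)) at each point x."""
    return [(a > b) - (a < b) for a, b in zip(fi, fj)]

def delta(family, g, h):
    """Delta from (2): the maximum of (g - h) . T over all test-functions T_ij.

    Pairs with i == j are included; they give the zero test-function, so the
    returned value is never negative.
    """
    diff = [a - b for a, b in zip(g, h)]
    return max(
        sum(d * t for d, t in zip(diff, sign_vector(fi, fj)))
        for fi, fj in product(family, repeat=2)
    )

# Hypothetical toy data on a 3-point domain (not from the paper).
family = [
    [Fr(1, 2), Fr(1, 2), Fr(0)],
    [Fr(0), Fr(1, 2), Fr(1, 2)],
]
g = [Fr(2, 5), Fr(2, 5), Fr(1, 5)]
h = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]

delta_value = delta(family, g, h)
```

Since $T_{ij} = -T_{ji}$, the maximum over ordered pairs equals the maximum of the absolute values $|(g - h) \cdot T_{ij}|$.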



Now we discuss how test-functions can be viewed as a reformulation of Scheffé sets,
defined by Devroye and Lugosi [DL01] (inspired by [Sch47] and implicit in [Yat85]), as
follows. The Scheffé set of distributions $f_i, f_j$ is

$$A_{ij} = \{x \,;\, f_i(x) > f_j(x)\}.$$

Devroye and Lugosi say that $f_i$ wins against $f_j$ if

$$\left| \int_{A_{ij}} f_i - h(A_{ij}) \right| < \left| \int_{A_{ij}} f_j - h(A_{ij}) \right|. \qquad (3)$$



The advantage of using Scheffé sets is that for a concrete set $\mathcal{F}$ of distributions one can
immediately use the theory of Vapnik-Chervonenkis dimension [VC71] for the family of
Scheffé sets of $\mathcal{F}$ (this family is called the Yatracos class of $\mathcal{F}$), to obtain a bound on the
empirical error.

If $h, f_i, f_j$ are distributions then the condition (1) is equivalent to (3) (to see this recall
that $T_{ij} = -T_{ji}$, and add $(f_i - h) \cdot \mathbf{1} = (h - f_j) \cdot \mathbf{1}$ to (1), where $\mathbf{1}$ is the vector of all
ones). Thus, in our algorithms the test-functions can be replaced by Scheffé sets and VC
dimension arguments can be applied.

We chose to use test-functions for two reasons: first, they allow us to give succinct
proofs of our theorems (especially Theorem 7), and second, they immediately extend to
the case when the members of $\mathcal{F}$ are not distributions (cf., e.g., Exercise 6.2 in [DL01]).

Remark 1. Note that our value of $\Delta$, defined in terms of $\mathcal{T}_{\mathcal{F}}$, is at most twice the $\Delta$ used
in [DL01], which is defined in terms of Scheffé sets.



1.2 Previous Estimates 

In this section we restate the two algorithms for density estimation from Chapter 6 of
[DL01] using test-functions. The first algorithm requires less computation but has worse
guarantees than the second algorithm.

Algorithm 1 - Scheffé tournament winner.
Output $f \in \mathcal{F}$ with the most wins (ties broken arbitrarily).



Theorem 2 ([DL01], Theorem 6.2). Let $f_1 \in \mathcal{F}$ be the distribution output by Algorithm 1.
Then

$$\|f_1 - g\|_1 \le 9\, d_1(g, \mathcal{F}) + 8\Delta.$$

The number of inner products used by Algorithm 1 is $\Theta(|\mathcal{F}|^2)$.
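
Algorithm 1 can be sketched on a finite domain as follows. This is only an illustration under stated assumptions: distributions are lists of exact fractions, ties are broken by taking the first index with the most wins, and the names `win_counts`, `scheffe_tournament_winner` and the toy data are hypothetical.

```python
from fractions import Fraction as Fr

def sign_vector(fi, fj):
    """T_ij as a list: sgn(f_i(x) - f_j(x)) at each point x."""
    return [(a > b) - (a < b) for a, b in zip(fi, fj)]

def wins(fi, fj, h):
    """Condition (1): (f_i - h) . T_ij < (f_j - h) . T_ji."""
    lhs = sum((a - c) * t for a, c, t in zip(fi, h, sign_vector(fi, fj)))
    rhs = sum((b - c) * t for b, c, t in zip(fj, h, sign_vector(fj, fi)))
    return lhs < rhs

def win_counts(family, h):
    """Number of wins of each member against all other members."""
    n = len(family)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if wins(family[i], family[j], h):
                counts[i] += 1
            elif wins(family[j], family[i], h):
                counts[j] += 1  # a draw gives a win to neither side
    return counts

def scheffe_tournament_winner(family, h):
    """Index of a member with the most wins (first such index on ties)."""
    counts = win_counts(family, h)
    return counts.index(max(counts))

# Hypothetical toy data on a 3-point domain (not from the paper).
family = [
    [Fr(1, 2), Fr(1, 2), Fr(0)],
    [Fr(0), Fr(1, 2), Fr(1, 2)],
    [Fr(1, 3), Fr(1, 3), Fr(1, 3)],
]
h = [Fr(1, 2), Fr(5, 12), Fr(1, 12)]

counts = win_counts(family, h)
winner = scheffe_tournament_winner(family, h)
```

The double loop performs one comparison per unordered pair, which matches the quadratic count stated in Theorem 2.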






Algorithm 2 - Minimum distance estimate.
Output $f \in \mathcal{F}$ that minimizes

$$\max \{ |(f - h) \cdot T_{ij}| \,;\, f_i, f_j \in \mathcal{F} \}. \qquad (4)$$

Theorem 3 ([DL01], Theorem 6.3). Let $f_1$ be the distribution output by Algorithm 2.
Then

$$\|f_1 - g\|_1 \le 3\, d_1(g, \mathcal{F}) + 2\Delta.$$

The number of inner products used by Algorithm 2 is $O(|\mathcal{F}|^3)$.
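
A direct, brute-force reading of Algorithm 2 on a finite domain might look as follows. The representation (lists of exact fractions), the function names and the toy data are assumptions for illustration.

```python
from fractions import Fraction as Fr

def sign_vector(fi, fj):
    """T_ij as a list: sgn(f_i(x) - f_j(x)) at each point x."""
    return [(a > b) - (a < b) for a, b in zip(fi, fj)]

def minimum_distance_scores(family, h):
    """For each f in the family, the value of (4): max over ALL T_ij of |(f - h) . T_ij|."""
    tests = [sign_vector(fi, fj) for fi in family for fj in family]
    scores = []
    for f in family:
        diff = [a - c for a, c in zip(f, h)]
        scores.append(max(abs(sum(d * t for d, t in zip(diff, tv))) for tv in tests))
    return scores

def minimum_distance_estimate(family, h):
    """Index of a member minimizing (4) (first such index on ties)."""
    scores = minimum_distance_scores(family, h)
    return scores.index(min(scores))

# Hypothetical toy data on a 3-point domain (not from the paper).
family = [
    [Fr(1, 2), Fr(1, 2), Fr(0)],
    [Fr(0), Fr(1, 2), Fr(1, 2)],
    [Fr(1, 3), Fr(1, 3), Fr(1, 3)],
]
h = [Fr(1, 2), Fr(5, 12), Fr(1, 12)]

scores = minimum_distance_scores(family, h)
estimate = minimum_distance_estimate(family, h)
```

Each of the $|\mathcal{F}|$ candidates is scored against all $|\mathcal{F}|^2$ test-functions, which is where the cubic number of inner products comes from.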

Let us point out that Theorems 6.2 and 6.3 in [DL01] require that each $f \in \mathcal{F}$ is
a distribution, that is, $\int f = 1$. Since we use test-functions in the algorithms instead of
Scheffé set based comparisons, the assumption $\int f = 1$ is not actually needed in the proofs
of Theorems 6.2 and 6.3 (we skip the proof), and is not used in the proofs of Theorems 4 and 7.

2 Our estimators 

2.1 A variant of the minimum distance estimate 

The following modified minimum distance estimate uses only $O(|\mathcal{F}|^2)$ computations as
compared to $O(|\mathcal{F}|^3)$ computations used by Algorithm 2 (for each candidate, equation (5) takes the maximum of
$O(|\mathcal{F}|)$ terms, whereas equation (4) takes the maximum of $O(|\mathcal{F}|^2)$ terms), but as we show in
Theorem 4 it gives us the same guarantee as the minimum distance estimate.

Algorithm 3 - Modified minimum distance estimate.
Output $f_i \in \mathcal{F}$ that minimizes

$$\max \{ |(f_i - h) \cdot T_{ij}| \,;\, f_j \in \mathcal{F} \}. \qquad (5)$$

Theorem 4. Let $f_1 \in \mathcal{F}$ be the distribution output by Algorithm 3. Then

$$\|f_1 - g\|_1 \le 3\, d_1(g, \mathcal{F}) + 2\Delta.$$

The number of inner products used by Algorithm 3 is $O(|\mathcal{F}|^2)$.
Proof:

Let $f_1 \in \mathcal{F}$ be the function output by Algorithm 3. Let $f_2 = \arg\min_{f \in \mathcal{F}} \|f - g\|_1$. By the
triangle inequality we have

$$\|f_1 - g\|_1 \le \|f_1 - f_2\|_1 + \|f_2 - g\|_1. \qquad (6)$$

We bound $\|f_1 - f_2\|_1$ as follows:

$$\|f_1 - f_2\|_1 = (f_1 - f_2) \cdot T_{12} \le |(f_1 - h) \cdot T_{12}| + |(f_2 - h) \cdot T_{12}| \le |(f_1 - h) \cdot T_{12}| + \max_{f_j \in \mathcal{F}} |(f_2 - h) \cdot T_{2j}|,$$

where in the last inequality we used the fact that $T_{12} = -T_{21}$.
By the criterion for selecting $f_1$ we have $|(f_1 - h) \cdot T_{12}| \le \max_{f_j \in \mathcal{F}} |(f_2 - h) \cdot T_{2j}|$ (since
otherwise $f_2$ would be selected). Hence

$$\|f_1 - f_2\|_1 \le 2 \max_{f_j \in \mathcal{F}} |(f_2 - h) \cdot T_{2j}| \le 2 \max_{f_j \in \mathcal{F}} |(f_2 - g) \cdot T_{2j}| + 2 \max_{f_j \in \mathcal{F}} |(g - h) \cdot T_{2j}|$$
$$\le 2 \|f_2 - g\|_1 + 2 \max_{T \in \mathcal{T}_{\mathcal{F}}} |(g - h) \cdot T| = 2 \|f_2 - g\|_1 + 2\Delta.$$

Combining the last inequality with (6) we obtain

$$\|f_1 - g\|_1 \le 3 \|f_2 - g\|_1 + 2\Delta. \qquad ■$$
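
For comparison with Algorithm 2, here is a minimal sketch of Algorithm 3 on a finite domain. The only change is that candidate $f_i$ is scored against its own test-functions $T_{ij}$; the names and toy data are hypothetical.

```python
from fractions import Fraction as Fr

def sign_vector(fi, fj):
    """T_ij as a list: sgn(f_i(x) - f_j(x)) at each point x."""
    return [(a > b) - (a < b) for a, b in zip(fi, fj)]

def modified_scores(family, h):
    """For each f_i, the value of (5): max over j of |(f_i - h) . T_ij|."""
    scores = []
    for fi in family:
        diff = [a - c for a, c in zip(fi, h)]
        scores.append(max(
            abs(sum(d * t for d, t in zip(diff, sign_vector(fi, fj))))
            for fj in family
        ))
    return scores

def modified_minimum_distance_estimate(family, h):
    """Index of a member minimizing (5) (first such index on ties)."""
    scores = modified_scores(family, h)
    return scores.index(min(scores))

# Hypothetical toy data on a 3-point domain (not from the paper).
family = [
    [Fr(1, 2), Fr(1, 2), Fr(0)],
    [Fr(0), Fr(1, 2), Fr(1, 2)],
    [Fr(1, 3), Fr(1, 3), Fr(1, 3)],
]
h = [Fr(1, 2), Fr(5, 12), Fr(1, 12)]

scores = modified_scores(family, h)
estimate = modified_minimum_distance_estimate(family, h)
```

On this toy input the scores happen to coincide with those of (4); in general the value of (5) is at most the value of (4) for each candidate, since the maximum is taken over fewer test-functions.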



Remark 5. Note that one can modify Theorem 4 to only require that $g$ and $h$ be "close"
with respect to the test-functions for the "best" function in the class, that is, only
$|(g - h) \cdot T_{2j}|$ needs to be small (where $f_2$ is $\arg\min_{f \in \mathcal{F}} \|f - g\|_1$).

One can ask whether the observation in Remark 5 can lead to improved density
estimation algorithms for concrete sets of distributions. The bounds on $\Delta$ (which is given
by (2)) are often based on the VC-dimension of the Yatracos class of $\mathcal{F}$. Recall that the
Yatracos class $Y$ is the set of $A_{ij} = \{x \,;\, f_i(x) > f_j(x)\}$ for all $f_i, f_j \in \mathcal{F}$. Remark 5 implies
that instead of the Yatracos class it is enough to consider the set $Y_i = \{A_{ij} \,;\, f_j \in \mathcal{F}\}$ for
$f_i \in \mathcal{F}$. Is it possible that the VC-dimension of each set $Y_i$ is smaller than the VC-dimension
of the Yatracos class $Y$? The following (artificial) example shows that this can, indeed,
be the case. Let $\Omega = \{0, \dots, n\}$. For each $(n+1)$-bit binary string $a_0, a_1, \dots, a_n$, let us
consider the distribution

P(k) = ^(1 + (1/2 - ao)(l/2 - a fc ))2-^=i°^', 

for $k \in \{1, \dots, n\}$ (with $P(0)$ chosen to make $P$ into a distribution). For this family of
$2^{n+1}$ distributions the VC-dimension of the Yatracos class is $n$, whereas each $Y_i$ has
VC-dimension 1 (since a pair of distributions $f_i, f_j$ has a non-trivial set $A_{ij}$ if and only if their
binary strings differ only in the first bit).

2.2 An even more efficient estimator - minimum loss-weight 

In this section we present an estimator which, after preprocessing $\mathcal{F}$, uses only $O(|\mathcal{F}|)$
inner products to obtain a density estimate. The guarantees of the estimate are the same
as for Algorithms 2 and 3.

The algorithm uses the following quantity to choose the estimate:

loss-weight$(f) = \max \{ \|f - f'\|_1 \,;\, f \text{ does not win against } f' \in \mathcal{F} \}$.

Intuitively a good estimate should have small loss-weight (ideally the loss-weight of
the estimate would be $-\infty = \max\{\}$, that is, the estimate would not lose at all). Thus
the following algorithm would be a natural candidate for a good density estimator (and,
indeed, it has a guarantee matching Algorithms 2 and 3), but, unfortunately, we do not
know how to implement it using $O(|\mathcal{F}|)$ inner products.
Algorithm 4a - Minimum loss-weight estimate.
Output $f \in \mathcal{F}$ that minimizes loss-weight$(f)$.

The next algorithm seems less natural than Algorithm 4a, but its condition can be
implemented using only $O(|\mathcal{F}|)$ inner products.

Algorithm 4b - Efficient minimum loss-weight estimate.
Output $f \in \mathcal{F}$ such that for every $f'$ to which $f$ loses we have

$$\|f - f'\|_1 \le \text{loss-weight}(f'). \qquad (7)$$

Before we delve into the proof of (8) let us see how Algorithm 4b can be made to use
$|\mathcal{F}| - 1$ inner products. We preprocess $\mathcal{F}$ by computing $L_1$-distances between all pairs
of distributions in $\mathcal{F}$ and store the distances in a list sorted in decreasing order. When
the algorithm is presented with the empirical distribution $h$, all it needs to do is perform
comparisons between selected pairs of distributions. The advantage is that we preprocess
$\mathcal{F}$ only once and, for each new empirical distribution, we only compute the inner products
necessary for the comparisons.

We will compute the estimate as follows. 





    input : family of distributions $\mathcal{F}$, list $L$ of all pairs $\{f_i, f_j\}$ sorted in decreasing
            order by $\|f_i - f_j\|_1$, oracle for computing inner products $h \cdot T_{ij}$.
    output: $f \in \mathcal{F}$ such that: $(\forall f')$ $f$ loses to $f'$ $\implies$ $\|f - f'\|_1 \le$ loss-weight$(f')$.
    1  $S \leftarrow \mathcal{F}$
    2  repeat
    3      pick the first edge $\{f_i, f_j\}$ in $L$
    4      if $f_i$ loses to $f_j$ then $f' \leftarrow f_i$ else $f' \leftarrow f_j$ fi
    5      remove $f'$ from $S$
    6      remove pairs containing $f'$ from $L$
    7  until $|S| = 1$
    8  output the distribution in $S$

Algorithm 4b - using $O(|\mathcal{F}|)$ inner products.
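
The procedure above can be sketched in Python for distributions on a finite domain. This is an illustrative sketch, not the authors' implementation: the representation (lists of exact fractions), the helper names, the way removed pairs are skipped rather than deleted from the list, and the toy data are all assumptions.

```python
from fractions import Fraction as Fr
from itertools import combinations

def sign_vector(fi, fj):
    """T_ij as a list: sgn(f_i(x) - f_j(x)) at each point x."""
    return [(a > b) - (a < b) for a, b in zip(fi, fj)]

def l1(fi, fj):
    return sum(abs(a - b) for a, b in zip(fi, fj))

def wins(fi, fj, h):
    """Condition (1): (f_i - h) . T_ij < (f_j - h) . T_ji."""
    lhs = sum((a - c) * t for a, c, t in zip(fi, h, sign_vector(fi, fj)))
    rhs = sum((b - c) * t for b, c, t in zip(fj, h, sign_vector(fj, fi)))
    return lhs < rhs

def preprocess(family):
    """Index pairs sorted by decreasing L1-distance; does not depend on h."""
    return sorted(combinations(range(len(family)), 2),
                  key=lambda p: l1(family[p[0]], family[p[1]]), reverse=True)

def efficient_min_loss_weight(family, sorted_pairs, h):
    """Return (index of the estimate, number of comparisons performed)."""
    alive = set(range(len(family)))
    comparisons = 0
    for i, j in sorted_pairs:
        if len(alive) == 1:
            break
        if i not in alive or j not in alive:
            continue  # pair was implicitly removed together with an earlier loser
        comparisons += 1
        # Step 4: remove f_i if it loses to f_j, otherwise remove f_j.
        loser = i if wins(family[j], family[i], h) else j
        alive.remove(loser)
    return alive.pop(), comparisons

# Hypothetical toy data on a 3-point domain (not from the paper).
family = [
    [Fr(1, 2), Fr(1, 2), Fr(0)],
    [Fr(0), Fr(1, 2), Fr(1, 2)],
    [Fr(1, 3), Fr(1, 3), Fr(1, 3)],
]
h = [Fr(1, 2), Fr(5, 12), Fr(1, 12)]

pairs = preprocess(family)
result = efficient_min_loss_weight(family, pairs, h)
```

Each comparison removes one element, so exactly $|\mathcal{F}| - 1$ comparisons are made. In this sketch `wins` recomputes $f_i \cdot T_{ij}$ on every call; in the setting of the paper those quantities do not depend on $h$ and could be stored during preprocessing, leaving one inner product $h \cdot T_{ij}$ per comparison.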



Note that while Algorithm 4b uses only $O(|\mathcal{F}|)$ inner products, its running time is
actually $\Theta(|\mathcal{F}|^2)$, since it traverses a list of length $\Theta(|\mathcal{F}|^2)$. If we are willing to spend
exponential time for the preprocessing then we can build the complete decision tree
corresponding to Algorithm 4b and obtain a linear-time density selection procedure. Is it
possible to achieve linear running time using only polynomial-time preprocessing?

Question 6 (Tournament Revelation Problem). We are given a weighted undirected com- 
plete graph on n vertices. Assume that the edge-weights are distinct. We preprocess the 
weighted graph and then play the following game with an adversary until only one vertex 
remains: we report the edge with the largest weight and the adversary chooses one of the 
endpoints of the edge and removes it from the graph (together with all the adjacent edges). 

Our goal is to make the computational cost during the game linear-time (in n) in the 
worst-case (over the adversary's moves). Is it possible to achieve this goal with polynomial- 
time preprocessing? 

We now show that the estimate $f$ output by Algorithm 4b satisfies (7) for every $f'$ against
which $f$ loses. We show, using induction, that the following invariant is always satisfied
on line 2: for any $f \in S$ and any $f' \in \mathcal{F} \setminus S$, if $f$ loses to $f'$ then $\|f - f'\|_1 \le$
loss-weight$(f')$. Initially, $\mathcal{F} \setminus S$ is empty and the invariant is trivially true. For the
inductive step, let $f'$ be the distribution most recently removed from $S$. To prove the
induction step we only need to show that for every $f \in S$, if $f$ loses to $f'$
then $\|f - f'\|_1 \le$ loss-weight$(f')$. Let $W$ be the largest $L_1$-distance between two distributions in
$S \cup \{f'\}$. Then loss-weight$(f') \ge W$ (since $f'$ lost), and $\|f - f'\|_1 \le W$ (by the definition
of $W$).

Theorem 7. Let $f_1 \in \mathcal{F}$ be the distribution output by Algorithm 4a (or Algorithm 4b).
Then

$$\|f_1 - g\|_1 \le 3\, d_1(g, \mathcal{F}) + 2\Delta. \qquad (8)$$

Assume that we are given $L_1$-distances between every pair in $\mathcal{F}$. The number of inner
products used by Algorithm 4b is $O(|\mathcal{F}|)$.

Proof of Theorem 7:

Let $f_4 = g$. Let $f_2$ be the function $f \in \mathcal{F}$ minimizing $\|g - f\|_1$. We can reformulate our
goal (8) as follows:

$$(f_1 - f_4) \cdot T_{14} \le 2\Delta + 3 (f_2 - f_4) \cdot T_{24}. \qquad (9)$$

Let $f_3 \in \mathcal{F}$ be the function $f' \in \mathcal{F}$ such that $f_2$ loses against $f'$ and $\|f_2 - f'\|_1$ is maximal.
Note that $f_1, f_2, f_3 \in \mathcal{F}$, but $f_4$ does not need to be in $\mathcal{F}$.

We know that $f_2$ loses against $f_3$, that is, we have (see (1))

$$2 h \cdot T_{23} \le f_2 \cdot T_{23} + f_3 \cdot T_{23}, \qquad (10)$$

and, since $f_1$ minimized the maximum loss, we also have

$$(f_1 - f_2) \cdot T_{12} \le (f_2 - f_3) \cdot T_{23}. \qquad (11)$$

By (2) we have

$$2 (f_4 - h) \cdot T_{23} \le 2\Delta. \qquad (12)$$

Adding (10), (11), and (12) we obtain

$$2 (f_2 - f_4) \cdot T_{23} + (f_2 - f_1) \cdot T_{12} + 2\Delta \ge 0. \qquad (13)$$

Note that for any $i, j, k, \ell$ we have:

$$(f_i - f_j) \cdot (T_{ij} - T_{k\ell}) \ge 0, \qquad (14)$$

since if $f_i(x) > f_j(x)$ then $T_{ij}(x) - T_{k\ell}(x) \ge 0$, if $f_i(x) < f_j(x)$ then $T_{ij}(x) - T_{k\ell}(x) \le 0$, and if
$f_i(x) = f_j(x)$ then the contribution of that $x$ is zero. By applying (14) four times we
obtain

$$(f_2 - f_4) \cdot (3 T_{24} - 2 T_{23} - T_{14}) + (f_1 - f_2) \cdot (T_{12} - T_{14}) \ge 0. \qquad (15)$$

Finally, adding (13) and (15) yields (9). ■

Remark 8. Note that Remark 5 also applies to Algorithms 4a and 4b, since (12) is the
only inequality in which $\Delta$ is used.

Remark 9. If the condition (7) of Algorithm 4b is relaxed to

$$\|f - f'\|_1 \le C \cdot \text{loss-weight}(f'), \qquad (16)$$

for some $C \ge 1$, one can prove an analogue of Theorem 7 with (8) replaced by

$$\|f_1 - g\|_1 \le (1 + 2C)\, d_1(g, \mathcal{F}) + 2C\Delta. \qquad (17)$$
3 Randomized algorithm and mixtures

In this section we explore the following question: can the constant 3 be improved if we allow
randomized algorithms? Let $f$ be the output of a randomized algorithm ($f$ is a random
variable with values in $\mathcal{F}$). We would like to bound the expected error $E[\|f - g\|_1]$.

If instead of randomization we consider algorithms which output mixtures of distributions
in $\mathcal{F}$ we obtain a related problem. Indeed, let $\alpha$ be the distribution on $\mathcal{F}$ produced
by a randomized algorithm, and let $r = \sum_{s \in \mathcal{F}} \alpha_s s$ be the corresponding mixture. Then,
by the triangle inequality, we have

$$\|r - g\|_1 \le E[\|f - g\|_1].$$

Hence the model in which the output is allowed to be a mixture of distributions in $\mathcal{F}$ is
"easier" than the model in which the density selection algorithm is randomized.

We consider here only the special case in which $\mathcal{F}$ has only two distributions $f_1, f_2$, and
give a randomized algorithm with a better guarantee than is possible for deterministic
algorithms. Later, in Section 4, we give a matching lower bound in the mixture model.

To simplify the exposition we will, without loss of generality, assume that $\|f_1 - f_2\|_1 > 0$.
Thus for any $h$ we have $(f_1 - h) \cdot T_{12} + (h - f_2) \cdot T_{12} = \|f_1 - f_2\|_1 > 0$.



Algorithm 5 - Randomized estimate.
Let

$$r = \frac{|(f_1 - h) \cdot T_{12}|}{|(f_2 - h) \cdot T_{12}|}.$$

With probability $1/(r + 1)$ output $f_1$, otherwise output $f_2$.

(By convention, if $|(f_2 - h) \cdot T_{12}| = 0$ then we take $r = \infty$ and output $f_2$ with probability 1.)
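
A minimal sketch of Algorithm 5 on a finite domain follows. It uses the equivalent form $1/(r+1) = |(f_2 - h) \cdot T_{12}| / (|(f_1 - h) \cdot T_{12}| + |(f_2 - h) \cdot T_{12}|)$, which also covers the convention $r = \infty$. The function names, the use of `random.Random`, and the toy data are assumptions for illustration.

```python
import random
from fractions import Fraction as Fr

def sign_vector(fi, fj):
    """T_ij as a list: sgn(f_i(x) - f_j(x)) at each point x."""
    return [(a > b) - (a < b) for a, b in zip(fi, fj)]

def probability_of_f1(f1, f2, h):
    """Probability 1/(r+1) with which Algorithm 5 outputs f_1.

    Assumes ||f_1 - f_2||_1 > 0, so the two absolute values cannot both vanish.
    """
    t12 = sign_vector(f1, f2)
    num = abs(sum((a - c) * t for a, c, t in zip(f1, h, t12)))  # |(f_1 - h) . T_12|
    den = abs(sum((b - c) * t for b, c, t in zip(f2, h, t12)))  # |(f_2 - h) . T_12|
    if den == 0:
        return Fr(0)  # convention r = infinity: always output f_2
    return Fr(den) / (num + den)  # equals 1 / (r + 1) with r = num / den

def randomized_estimate(f1, f2, h, rng):
    """Sample the output of Algorithm 5 using the generator rng."""
    return f1 if rng.random() < float(probability_of_f1(f1, f2, h)) else f2

# Hypothetical toy data on a 3-point domain (not from the paper).
f1 = [Fr(1, 2), Fr(1, 2), Fr(0)]
f2 = [Fr(0), Fr(1, 2), Fr(1, 2)]
h = [Fr(1, 2), Fr(5, 12), Fr(1, 12)]

p = probability_of_f1(f1, f2, h)
sample = randomized_estimate(f1, f2, h, random.Random(0))
```

Here $h$ is much closer to $f_1$ than to $f_2$ along $T_{12}$, so $f_1$ is returned with high probability.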

Theorem 10. Let $\mathcal{F} = \{f_1, f_2\}$. Let $f \in \mathcal{F}$ be the distribution output by Algorithm 5.
Then

$$E[\|f - g\|_1] \le 2\, d_1(g, \mathcal{F}) + \Delta.$$



Proof:

Without loss of generality assume that $f_2 = \arg\min_{f \in \mathcal{F}} \|f - g\|_1$. First we bound the error
of $f_1$ and later use it to bound the error of $f$. We have, by the triangle inequality,

$$\|f_1 - g\|_1 \le \|f_1 - f_2\|_1 + \|f_2 - g\|_1.$$

We can bound $\|f_1 - f_2\|_1$ as follows:

$$\|f_1 - f_2\|_1 = (f_1 - f_2) \cdot T_{12} \le |(f_1 - h) \cdot T_{12}| + |(f_2 - h) \cdot T_{12}|$$
$$= (r + 1) |(f_2 - h) \cdot T_{12}| \le (r + 1) |(f_2 - g) \cdot T_{12}| + (r + 1) |(g - h) \cdot T_{12}|.$$

Thus

$$\|f_1 - g\|_1 \le (r + 2) \|f_2 - g\|_1 + (r + 1) \Delta. \qquad (18)$$

Hence

$$E[\|f - g\|_1] = \frac{1}{r + 1} \|f_1 - g\|_1 + \frac{r}{r + 1} \|f_2 - g\|_1 \le 2 \|f_2 - g\|_1 + \Delta,$$

where in the last inequality we used (18). ■



4 Lower bound examples 



In this section we construct an example showing that deterministic distribution selection
algorithms based on test-functions cannot improve on the constant 3, that is, Theorems
3, 4, and 7 are tight. For algorithms that output mixtures (and hence randomized algorithms)
the example yields a lower bound of 2, matching the constant in Theorem 10.

Lemma 11. For every $\varepsilon' > 0$ there exist distributions $f_1$, $f_2$, and $g = h$ such that

$$\|f_1 - g\|_1 \ge (3 - \varepsilon') \|f_2 - g\|_1,$$

and $f_1 \cdot T_{12} = -f_2 \cdot T_{12}$ and $h \cdot T_{12} = 0$.

Before we prove Lemma 11 let us see how it is applied. Consider the behavior of the
algorithm on empirical distribution $h$ for $\mathcal{F} = \{f_1, f_2\}$ and $\mathcal{F}' = \{f_1', f_2'\}$, where $f_1' = f_2$
and $f_2' = f_1$. Note that $T_{12}' = T_{21} = -T_{12}$ and hence

$$f_1' \cdot T_{12}' = -f_2 \cdot T_{12} = f_1 \cdot T_{12} = -f_2' \cdot T_{12}'.$$

Moreover, we have $h \cdot T_{12} = h \cdot T_{12}' = 0$. Note that all the test-functions have the same
value for $\mathcal{F}$ and $\mathcal{F}'$. Hence a test-function based algorithm either outputs $f_1$ and $f_1'$, or it
outputs $f_2$ and $f_2' = f_1$. In both cases it outputs $f_1$ for one of the inputs and hence we
obtain the following consequence.

Corollary 12. For any $\varepsilon > 0$ and any deterministic test-function based algorithm there
exist an input $\mathcal{F}$ and $h = g$ such that the output $f_1$ of the algorithm satisfies
$\|f_1 - g\|_1 \ge (3 - \varepsilon)\, d_1(g, \mathcal{F})$.

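Proof of Lemma 11:

Consider the following probability space consisting of 4 atomic events $A_1, A_2, A_3, A_4$:

              $A_1$                 $A_2$                 $A_3$     $A_4$
    $f_1$     $0$                   $1/4 + \varepsilon$   $1/2$     $1/4 - \varepsilon$
    $f_2$     $1/2 + \varepsilon$   $1/4 - \varepsilon$   $0$       $1/4$
    $g = h$   $1/2$                 $1/2$                 $0$       $0$
    $T_{12}$  $-1$                  $1$                   $1$       $-1$

Note that we have $f_1 \cdot T_{12} = -f_2 \cdot T_{12} = 1/2 + 2\varepsilon$, and $\|f_1 - g\|_1 = 3/2 - 2\varepsilon$, $\|f_2 - g\|_1 = 1/2 + 2\varepsilon$.
The ratio $\|f_1 - g\|_1 / \|f_2 - g\|_1$ gets arbitrarily close to 3 as $\varepsilon$ goes to zero. ■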
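
The example in the proof can be checked numerically. The sketch below assumes the table entries $f_1 = (0, 1/4+\varepsilon, 1/2, 1/4-\varepsilon)$, $f_2 = (1/2+\varepsilon, 1/4-\varepsilon, 0, 1/4)$ and $g = h = (1/2, 1/2, 0, 0)$, with blank entries of the extracted table read as 0 (an assumption supported by each row summing to 1); the value of $\varepsilon$ and the variable names are chosen only for illustration.

```python
from fractions import Fraction as Fr

def sign_vector(fi, fj):
    """T_ij as a list: sgn(f_i(x) - f_j(x)) at each point x."""
    return [(a > b) - (a < b) for a, b in zip(fi, fj)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

eps = Fr(1, 100)  # any value with 0 < eps < 1/4 keeps all entries non-negative

f1 = [Fr(0), Fr(1, 4) + eps, Fr(1, 2), Fr(1, 4) - eps]
f2 = [Fr(1, 2) + eps, Fr(1, 4) - eps, Fr(0), Fr(1, 4)]
g = [Fr(1, 2), Fr(1, 2), Fr(0), Fr(0)]  # g = h in the lemma

t12 = sign_vector(f1, f2)
dist_f1 = l1(f1, g)
dist_f2 = l1(f2, g)
ratio = dist_f1 / dist_f2  # approaches 3 as eps goes to zero
```

With exact arithmetic the quantities stated in the proof can be compared directly, for instance `dot(f1, t12)` against $1/2 + 2\varepsilon$.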

Consider $f_1$ and $f_2$ from the proof of Lemma 11. Let $f = \alpha f_1 + (1 - \alpha) f_2$ where
$\alpha \ge 1/2$. For $0 < \varepsilon < 1/4$ we have $\|f - g\|_1 = 1/2 + \alpha - 2\varepsilon\alpha \ge 1 - 2\varepsilon$. By symmetry, for
one of $\mathcal{F} = \{f_1, f_2\}$ and $\mathcal{F}' = \{f_1', f_2'\}$ (with $f_1' = f_2$ and $f_2' = f_1$), the algorithm outputs
$\alpha f_1 + (1 - \alpha) f_2$ with $\alpha \ge 1/2$, and hence we obtain the following.

Corollary 13. For any $\varepsilon > 0$ and any deterministic test-function based algorithm which
outputs a mixture there exist an input $\mathcal{F}$ and $h = g$ such that the output $f$ of the algorithm
satisfies $\|f - g\|_1 \ge (2 - \varepsilon)\, d_1(g, \mathcal{F})$.

Thus for two distributions the correct constant is 2 for randomized algorithms using 
test-functions. For larger families of distributions we do not know what the value of the 
constant is (we only know that it is from the interval [2, 3]). 
Question 14. What is the correct constant for deterministic test-function based algorithms
which output a mixture? What is the correct constant for randomized test-function based
algorithms?

Next we construct an example showing that 9 is the right constant for Algorithm 1. 

Lemma 15. For every $\varepsilon' > 0$ there exist probability distributions $f_1, f_2, f_3 = f_3'$ and $g$
such that

$$\|f_1 - g\|_1 \ge (9 - \varepsilon') \|f_2 - g\|_1,$$

yet Algorithm 1, for $\mathcal{F} = \{f_1, f_2, f_3, f_3'\}$, even when given the true distribution (that
is, $h = g$), outputs $f_1$.

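Proof:

Consider the following probability space with 6 events $A_1, \dots, A_6$ and $f_1$, $f_2$, $f_3$ and $g$ with
the probabilities given by the following table:

              $A_1$                  $A_2$                 $A_3$                  $A_4$                  $A_5$                  $A_6$
    $g = h$   $2/3 - 21\varepsilon$  $1/9 - 2\varepsilon$  $9\varepsilon$         $0$                    $2/9 + 14\varepsilon$  $0$
    $f_1$     $0$                    $18\varepsilon$       $2/3 - 12\varepsilon$  $2/9 - 13\varepsilon$  $9\varepsilon$         $1/9 - 2\varepsilon$
    $f_2$     $2/3 - 30\varepsilon$  $0$                   $0$                    $0$                    $2/9 + 14\varepsilon$  $1/9 + 16\varepsilon$
    $f_3$     $2/3 - 21\varepsilon$  $9\varepsilon$        $9\varepsilon$         $2/9 - 4\varepsilon$   $0$                    $1/9 + 7\varepsilon$
    $T_{12}$  $-1$                   $1$                   $1$                    $1$                    $-1$                   $-1$
    $T_{13}$  $-1$                   $1$                   $1$                    $-1$                   $1$                    $-1$
    $T_{23}$  $-1$                   $-1$                  $-1$                   $-1$                   $1$                    $1$

Note that we have

    $f_1 \cdot T_{12} = 7/9 - 14\varepsilon$,   $h \cdot T_{12} = -7/9 + 14\varepsilon$,   $f_2 \cdot T_{12} = -1$,
    $f_1 \cdot T_{13} = 1/3 + 30\varepsilon$,   $h \cdot T_{13} = -1/3 + 42\varepsilon$,   $f_3 \cdot T_{13} = -1 + 36\varepsilon$,
    $f_2 \cdot T_{23} = -1/3 + 60\varepsilon$,  $h \cdot T_{23} = -5/9 + 28\varepsilon$,   $f_3 \cdot T_{23} = -7/9 + 14\varepsilon$.

Hence $f_1$ wins over $f_3$, $f_3$ wins over $f_2$, and $f_2$ wins over $f_1$. Since $f_3 = f_3'$ we have that $f_1$
is the tournament winner. Finally, we have $\|f_1 - g\|_1 = 2 - 72\varepsilon$ and $\|f_2 - g\|_1 = 2/9 + 32\varepsilon$.
As $\varepsilon \to 0$ the ratio $\|f_1 - g\|_1 / \|f_2 - g\|_1$ gets arbitrarily close to 9. ■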
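
The construction can be verified with exact arithmetic. The sketch below assumes the probabilities from the table in the proof, with blank entries of the extracted table read as 0 (each row then sums to 1), and runs the tournament of Algorithm 1 on $\{f_1, f_2, f_3, f_3'\}$ with $h = g$. The value of $\varepsilon$ and all names are chosen only for illustration.

```python
from fractions import Fraction as Fr

def sign_vector(fi, fj):
    """T_ij as a list: sgn(f_i(x) - f_j(x)) at each point x."""
    return [(a > b) - (a < b) for a, b in zip(fi, fj)]

def wins(fi, fj, h):
    """Condition (1): (f_i - h) . T_ij < (f_j - h) . T_ji."""
    lhs = sum((a - c) * t for a, c, t in zip(fi, h, sign_vector(fi, fj)))
    rhs = sum((b - c) * t for b, c, t in zip(fj, h, sign_vector(fj, fi)))
    return lhs < rhs

def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

e = Fr(1, 1000)  # a small positive epsilon

g = [Fr(2, 3) - 21 * e, Fr(1, 9) - 2 * e, 9 * e, Fr(0), Fr(2, 9) + 14 * e, Fr(0)]
f1 = [Fr(0), 18 * e, Fr(2, 3) - 12 * e, Fr(2, 9) - 13 * e, 9 * e, Fr(1, 9) - 2 * e]
f2 = [Fr(2, 3) - 30 * e, Fr(0), Fr(0), Fr(0), Fr(2, 9) + 14 * e, Fr(1, 9) + 16 * e]
f3 = [Fr(2, 3) - 21 * e, 9 * e, 9 * e, Fr(2, 9) - 4 * e, Fr(0), Fr(1, 9) + 7 * e]

family = [f1, f2, f3, f3]  # f_3 appears twice, as f_3 and f_3'
h = g                      # the algorithm is given the true distribution

# Number of wins of each member against every other member.
win_counts = [
    sum(wins(fi, fj, h) for j, fj in enumerate(family) if j != i)
    for i, fi in enumerate(family)
]
winner = win_counts.index(max(win_counts))
ratio = l1(f1, g) / l1(f2, g)  # approaches 9 as e goes to zero
```

The duplicate copy of $f_3$ is what gives $f_1$ its second win; the two copies draw against each other.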